All data

## DataFrame with 130 rows and 28 columns
##               NREADS  NALIGNED    RALIGN TOTAL_DUP    PRIMER INSERT_SZ
##            <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## SRR1275356  10554900   7555880   71.5862   58.4931 0.0217638       208
## SRR1274090    196162    182494   93.0323   14.5122 0.0366826       247
## SRR1275251   8524470   5858130   68.7213   65.0428 0.0351827       230
## SRR1275287   7229920   5891540   81.4884   49.7609 0.0208685       222
## SRR1275364   5403640   4482910   82.9609   66.5788 0.0298284       228
## ...              ...       ...       ...       ...       ...       ...
## SRR1275259   5949930   4181040   70.2705   52.5975 0.0205253       224
## SRR1275253  10319900   7458710   72.2747   54.9637 0.0205342       207
## SRR1275285   5300270   4276650   80.6873   41.6394 0.0227383       222
## SRR1275366   7701320   6373600   82.7600   68.9431 0.0266275       233
## SRR1275261  13425000   9554960   71.1727   62.0001 0.0200522       241
##            INSERT_SZ_STD COMPLEXITY     NDUPR PCT_RIBOSOMAL_BASES
##                <numeric>  <numeric> <numeric>           <numeric>
## SRR1275356            63   0.868928  0.343113               2e-06
## SRR1274090           133   0.997655  0.935730               0e+00
## SRR1275251            89   0.789252  0.201082               0e+00
## SRR1275287            78   0.898100  0.538191               0e+00
## SRR1275364            76   0.890693  0.391660               0e+00
## ...                  ...        ...       ...                 ...
## SRR1275259            80   0.898898  0.399189               5e-06
## SRR1275253            62   0.863618  0.344744               0e+00
## SRR1275285            76   0.920068  0.638765               2e-06
## SRR1275366            83   0.860359  0.343122               0e+00
## SRR1275261           105   0.806833  0.234551               0e+00
##            PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES
##                   <numeric>     <numeric>          <numeric>
## SRR1275356         0.125806      0.180954           0.613229
## SRR1274090         0.309822      0.412917           0.205185
## SRR1275251         0.398461      0.473884           0.039886
## SRR1275287         0.196420      0.227592           0.498944
## SRR1275364         0.138617      0.210406           0.543941
## ...                     ...           ...                ...
## SRR1275259         0.261384      0.383665           0.264250
## SRR1275253         0.110732      0.190036           0.606814
## SRR1275285         0.143667      0.231103           0.540070
## SRR1275366         0.215696      0.307817           0.409437
## SRR1275261         0.408881      0.391068           0.147748
##            PCT_INTERGENIC_BASES PCT_MRNA_BASES MEDIAN_CV_COVERAGE
##                       <numeric>      <numeric>          <numeric>
## SRR1275356             0.080008       0.306760           1.495770
## SRR1274090             0.072076       0.722739           1.007580
## SRR1275251             0.087770       0.872345           1.242990
## SRR1275287             0.077044       0.424013           0.775981
## SRR1275364             0.107035       0.349024           1.441370
## ...                         ...            ...                ...
## SRR1275259             0.090696       0.645049           1.101040
## SRR1275253             0.092418       0.300768           1.701690
## SRR1275285             0.085158       0.374770           0.714087
## SRR1275366             0.067050       0.523513           1.251980
## SRR1275261             0.052302       0.799949           0.939066
##            MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS
##                     <numeric>          <numeric>
## SRR1275356           0.000000           0.166122
## SRR1274090           0.181742           0.698991
## SRR1275251           0.000000           0.340046
## SRR1275287           0.010251           0.350915
## SRR1275364           0.000000           0.204074
## ...                       ...                ...
## SRR1275259           0.000000           0.315550
## SRR1275253           0.000000           0.106902
## SRR1275285           0.019578           0.419987
## SRR1275366           0.000000           0.281554
## SRR1275261           0.000292           0.290117
##            MEDIAN_5PRIME_TO_3PRIME_BIAS sample_id.x           Lane_ID
##                               <numeric> <character>       <character>
## SRR1275356                     1.036250   SRX534610 D24VYACXX130502:4
## SRR1274090                     0.293510   SRX534823                 1
## SRR1275251                     0.201518   SRX534623 D24VYACXX130502:4
## SRR1275287                     0.292838   SRX534641 D24VYACXX130502:1
## SRR1275364                     0.619863   SRX534614 D24VYACXX130502:7
## ...                                 ...         ...               ...
## SRR1275259                     0.350391   SRX534627 D24VYACXX130502:4
## SRR1275253                     0.944856   SRX534624 D24VYACXX130502:3
## SRR1275285                     0.194939   SRX534640 D24VYACXX130502:1
## SRR1275366                     0.388272   SRX534615 D24VYACXX130502:8
## SRR1275261                     0.384402   SRX534628 D24VYACXX130502:3
##            LibraryName avgLength     spots Biological_Condition
##            <character> <integer> <integer>          <character>
## SRR1275356      GW16_2       202   9818076                 GW16
## SRR1274090       NPC_9        60     95454                  NPC
## SRR1275251      GW16_8       202   7935952                 GW16
## SRR1275287    GW21+3_2       202   6531944               GW21+3
## SRR1275364     GW16_23       202   4919561                 GW16
## ...                ...       ...       ...                  ...
## SRR1275259      GW21_3       202   5528916                 GW21
## SRR1275253      GW16_9       202   9562204                 GW16
## SRR1275285   GW21+3_16       202   4860721               GW21+3
## SRR1275366     GW16_24       202   7153688                 GW16
## SRR1275261      GW21_4       202  12142387                 GW21
##            Coverage_Type Cluster1 Cluster2
##              <character> <factor> <factor>
## SRR1275356          High     IIIb      III
## SRR1274090           Low       1a        I
## SRR1275251          High       NA      III
## SRR1275287          High       1c        I
## SRR1275364          High     IIIb      III
## ...                  ...      ...      ...
## SRR1275259          High       NA      III
## SRR1275253          High     IIIb      III
## SRR1275285          High      Iva       IV
## SRR1275366          High       NA      III
## SRR1275261          High       II       II

Regular PCA

Weighted PCA (scone and scde)

Imputed data

Something is clearly wrong with the imputed data. Either the weights are incorrect or the average used for imputation is incorrect. I suspect the latter, given that we are estimating a single fixed mean in a heterogeneous population.
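The suspicion about the fixed mean can be illustrated with a toy example. This is only a sketch on simulated data, not the imputation used in this analysis: the two groups, the 0/1 weights, and all variable names are assumptions made for illustration. It imputes dropouts for one gene either with a single weighted mean or with a group-specific weighted mean, and compares the error against the simulated truth.

```r
set.seed(1)

# Hypothetical toy data: one gene measured in two subpopulations
# with different true mean expression.
n <- 100
group <- rep(c("A", "B"), each = n / 2)
truth <- ifelse(group == "A", 2, 8)
expr <- rnorm(n, mean = truth, sd = 0.5)

# Simulate dropouts: some observations are recorded as zero.
dropout <- runif(n) < 0.3
observed <- ifelse(dropout, 0, expr)

# Assumed weights: 0 for dropouts, 1 otherwise (treated as known here).
w <- as.numeric(!dropout)

# Impute with one weighted mean for the whole population.
global_mean <- weighted.mean(observed, w)
imputed_global <- ifelse(dropout, global_mean, observed)

# Impute with a weighted mean computed separately within each group.
group_mean <- tapply(seq_len(n), group,
                     function(i) weighted.mean(observed[i], w[i]))
imputed_group <- ifelse(dropout, group_mean[group], observed)

# Mean absolute error on the imputed entries, relative to the simulated truth.
err_global <- mean(abs(imputed_global[dropout] - truth[dropout]))
err_group <- mean(abs(imputed_group[dropout] - truth[dropout]))
c(global = err_global, by_group = err_group)
```

In this toy setting the single mean falls between the two groups, so every imputed value is pulled away from its own group even though the weights are exactly right. That is consistent with the average, rather than the weights, being the problem, but it does not rule out incorrect weights in the real data.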

Correlation with sample quality

High coverage

Here, we repeat the analysis separately for the high-coverage data and for the low-coverage data.
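A minimal sketch of such a split, assuming a log-expression matrix with cells in columns and a metadata table whose `Coverage_Type` column takes the values `High` and `Low` as in the table printed above. The objects `logexpr` and `meta` and the helper `pca_by_coverage` are hypothetical stand-ins filled with simulated values, and plain `prcomp` is used in place of whatever PCA variant the full analysis applies.

```r
set.seed(42)

# Hypothetical stand-ins: a genes x cells log-expression matrix and a
# per-cell metadata table with a Coverage_Type column.
n_cells <- 20
n_genes <- 50
logexpr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("cell", seq_len(n_cells))))
meta <- data.frame(Coverage_Type = rep(c("High", "Low"), each = n_cells / 2),
                   row.names = colnames(logexpr))

# Run an unweighted PCA on the cells of one coverage group only.
pca_by_coverage <- function(type) {
  keep <- meta$Coverage_Type == type
  # prcomp expects observations (cells) in rows, hence the transpose.
  prcomp(t(logexpr[, keep, drop = FALSE]), center = TRUE, scale. = FALSE)
}

pca_high <- pca_by_coverage("High")
pca_low <- pca_by_coverage("Low")

# Cell scores on the first two components for each subset.
head(pca_high$x[, 1:2])
head(pca_low$x[, 1:2])
```

Centering is done within each subset, so the components of the two fits are not directly comparable with each other or with the PCA on all cells.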